# Eigen tips
## 1. Install OpenBLAS

Install gfortran first.

- Download OpenBLAS 0.3.20 and unpack it:
> tar -xzvf openblas*
> cd openblas
- Build with OpenMP support (set USE_OPENMP=0 if OpenMP is not wanted), then install:
> make USE_OPENMP=1
> make install
The output of `make install` shows that OpenBLAS is installed in /opt/OpenBLAS:
make -j 16 -f Makefile.install install
make[1]: Entering directory '/home/hxhc/Downloads/OpenBLAS-0.3.20'
Generating openblas_config.h in /opt/OpenBLAS/include
Generating f77blas.h in /opt/OpenBLAS/include
Generating cblas.h in /opt/OpenBLAS/include
Copying LAPACKE header files to /opt/OpenBLAS/include
Copying the static library to /opt/OpenBLAS/lib
Copying the shared library to /opt/OpenBLAS/lib
Generating openblas.pc in /opt/OpenBLAS/lib/pkgconfig
Generating OpenBLASConfig.cmake in /opt/OpenBLAS/lib/cmake/openblas
Generating OpenBLASConfigVersion.cmake in /opt/OpenBLAS/lib/cmake/openblas
Install OK!
It takes about
## 2. Install Intel oneMKL using conda

- Use
> mamba install -c intel mkl-devel==2022.0.1
to install the MKL library at that specific version; the other dependencies will be installed as well.
| Lib | Version | Size |
|---|---|---|
| intel-openmp | 2022.0.1 intel_3633 | 8 MB |
| mkl | 2022.0.1 intel_117 | 199 MB |
| mkl-devel | 2022.0.1 intel_117 | 15 KB |
| mkl-include | 2022.0.1 intel_117 | 726 KB |
| tbb | 2021.5.0 intel_707 | 1 MB |
- Add
> export MKLROOT=/home/hxhc/miniconda3
to `.bashrc` or `.zshrc`.
## 3. Eigen performance test

### 3.1 Compile

- MKL linking

The link command can be generated with the Intel MKL Link Line Advisor:
> g++ eigen_performance.cpp -o ./output/eigen_performance -m64 -I ./libs -I"${MKLROOT}/include" -L${MKLROOT}/lib -Wl,-rpath=${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl -march=native -O3
Note that there must be no space in `-Wl,-rpath=${MKLROOT}/lib`.
- OpenBLAS linking
> g++ eigen_performance.cpp -o ./output/eigen_performance -I ./libs -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib -Wl,-rpath=/opt/OpenBLAS/lib -lopenblas -march=native -O3
### 3.2 Matrix multiplication
#define EIGEN_USE_BLAS
#define EIGEN_USE_LAPACK
#define EIGEN_USE_MKL_ALL
#include <iostream>
#include <chrono>
#include <Eigen/Dense>
#include <EigenRand/EigenRand>
#include <vector>
#include <fstream>
Eigen::ArrayXd RunMatMultTest(int size=100, size_t loops=10)
{
Eigen::Rand::Vmt19937_64 urng{static_cast<uint64_t>(size)};
Eigen::Rand::StdNormalGen<double> stdnorm;
Eigen::ArrayXd duration_list(loops);
Eigen::MatrixXd A = stdnorm.generate<Eigen::MatrixXd>(size, size, urng);
for (size_t i = 0; i < loops; ++i)
{
auto t0 = std::chrono::steady_clock::now();
Eigen::MatrixXd B = A * A;
auto t1 = std::chrono::steady_clock::now();
std::chrono::duration<double> duration = t1 - t0;
duration_list[i] = duration.count();
}
return duration_list;
}
int main(){
int size = 100;
std::cout << "Please enter the size of the square matrix:" << std::endl;
std::cin >> size;
int loops = 10;
// auto t0 = std::chrono::steady_clock::now();
Eigen::ArrayXd duration_list = RunMatMultTest(size, loops);
// auto t1 = std::chrono::steady_clock::now();
// auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
// std::cout << duration.count()/loops <<" ms\n";
std::cout << duration_list << "\n";
std::cout <<"The mean time is: ";
std::cout << duration_list.mean() << "\n";
std::cin.get();
return 0;
}
Linked to MKL, it takes about
### 3.3 Eigenvalue decomposition
Eigen::ArrayXd RunEigDecompTest(int size=100, int loops=10)
{
Eigen::Rand::Vmt19937_64 urng{static_cast<uint64_t>(size)};
Eigen::Rand::StdNormalGen<double> stdnorm;
Eigen::ArrayXd duration_list(loops);
Eigen::MatrixXd A = stdnorm.generate<Eigen::MatrixXd>(size, size, urng);
Eigen::MatrixXd COV = A*A.transpose();
for (int i = 0; i < loops; ++i)
{
auto t0 = std::chrono::steady_clock::now();
Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> eigensolver(COV);
Eigen::VectorXd B = eigensolver.eigenvalues();
auto t1 = std::chrono::steady_clock::now();
std::chrono::duration<double> duration = t1 - t0;
duration_list[i] = duration.count();
}
return duration_list;
}
Linked to MKL, it takes about
Note that Eigen implements several eigensolvers. The code above uses SelfAdjointEigenSolver, which is intended for Hermitian (self-adjoint) matrices, such as the covariance matrices that arise in real applications, and is usually very fast. The corresponding functions in NumPy and Xtensor are np.linalg.eigh() and xt::linalg::eigh(), respectively, but the Eigen implementation is slower than Xtensor's. For a general matrix, the EigenSolver class must be used instead, and it is slower still.
### 3.4 Slice view

The code below is adapted from Roman Poya.
template <typename T>
T finite_difference_block_impl(Eigen::MatrixXd &u, int num)
{
Eigen::MatrixXd u_old = u;
u.block(1, 1, num - 2, num - 2) =
((u_old.block(0, 1, num - 2, num - 2) + u_old.block(2, 1, num - 2, num - 2) +
u_old.block(1, 0, num - 2, num - 2) + u_old.block(1, 2, num - 2, num - 2)) *
4.0 +
u_old.block(0, 0, num - 2, num - 2) + u_old.block(0, 2, num - 2, num - 2) +
u_old.block(2, 0, num - 2, num - 2) + u_old.block(2, 2, num - 2, num - 2)) /
20.0;
return (u - u_old).norm();
}
Eigen::ArrayXd RunFiniteDiffTest(int size=100, int loops=10)
{
Eigen::Rand::Vmt19937_64 urng{static_cast<uint64_t>(size)};
Eigen::Rand::StdNormalGen<double> stdnorm;
Eigen::ArrayXd duration_list(loops);
Eigen::MatrixXd A = stdnorm.generate<Eigen::MatrixXd>(size, size, urng);
for (int i = 0; i < loops; ++i)
{
auto t0 = std::chrono::steady_clock::now();
double b = finite_difference_block_impl<double>(A, size);
auto t1 = std::chrono::steady_clock::now();
std::chrono::duration<double> duration = t1 - t0;
duration_list[i] = duration.count();
}
return duration_list;
}
It takes about
The results show that Eigen performs much faster than Xtensor.
### 3.5 QR factorization
Note that Eigen provides several QR decompositions, including HouseholderQR, ColPivHouseholderQR, and FullPivHouseholderQR. The first is the fastest but the least numerically stable, while the last is the slowest but has guaranteed accuracy. ColPivHouseholderQR is generally a good trade-off.
On the other hand, NumPy appears to provide only the Householder version.
Eigen::ArrayXd RunQRTest(int size=100, int loops=10)
{
Eigen::Rand::Vmt19937_64 urng{static_cast<uint64_t>(size)};
Eigen::Rand::StdNormalGen<double> stdnorm;
Eigen::ArrayXd duration_list(loops);
Eigen::MatrixXd A = stdnorm.generate<Eigen::MatrixXd>(size, size, urng);
for (int i = 0; i < loops; ++i)
{
auto t0 = std::chrono::steady_clock::now();
Eigen::HouseholderQR<Eigen::MatrixXd> QR(A);
Eigen::MatrixXd result = QR.matrixQR();
auto t1 = std::chrono::steady_clock::now();
std::chrono::duration<double> duration = t1 - t0;
duration_list[i] = duration.count();
}
return duration_list;
}
It takes about
## Random numbers using EigenRand

- Use EigenRand to generate normal distributions in loops
std::random_device device;
Eigen::Rand::Vmt19937_64 urng{ device() };
Eigen::Rand::NormalGen<double> norm_gen{ 100, 7.6 };  // mean 100, stddev 7.6
for (int i = 0; i < 10; ++i) {
    // Reuse the one seeded engine across iterations instead of
    // reconstructing and reseeding it inside the loop
    Eigen::ArrayXd s1 = norm_gen.generate<Eigen::ArrayXd>(10, 1, urng);
    Eigen::ArrayXd s2 = norm_gen.generate<Eigen::ArrayXd>(20, 1, urng);
}
## Load csv file
#include <Eigen/Core>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>
template <typename EigenType>
EigenType load_csv(const std::string& file_path) {
std::ifstream file;
file.open(file_path);
if (!file.is_open()) {
throw std::runtime_error("Could not open file");
}
std::string line;
std::vector<double> values;
uint64_t rows = 0;
while (std::getline(file, line)) {
    std::stringstream line_stream(line);
    std::string cell;
    while (std::getline(line_stream, cell, ',')) {
        values.push_back(std::stod(cell));
    }
    ++rows;
}
file.close();
return Eigen::Map<const Eigen::Matrix<typename EigenType::Scalar, EigenType::RowsAtCompileTime, EigenType::ColsAtCompileTime, Eigen::RowMajor>>(values.data(), rows, values.size() / rows);
}